Garg, Urvashi
- Maulik: A Plagiarism Detection Tool for Hindi Documents
Authors
Urvashi Garg 1, Vishal Goyal 1
Affiliations
1 Punjabi University, Patiala, NH 64, Urban Estate Phase II, Patiala - 147002, Punjab, IN
Source
Indian Journal of Science and Technology, Vol 9, No 12 (2016)
Abstract
Objective: This paper presents Maulik, an automated plagiarism detection tool. Many plagiarism detection tools are available for English text; Maulik detects plagiarism in Hindi documents. Method: Maulik divides the text into n-grams and matches them against the text in its repository as well as against documents available online. Preprocessing techniques such as stop word removal and stemming are used. The best value of n for comparing two Hindi documents has also been determined. Cosine similarity is used to compute the similarity score. Findings: A similarity score of 96.3 was achieved, which is higher than that of existing Hindi plagiarism detection tools such as Plagiarism checker, Plagiarism finder, Plagiarisma, Dupli checker, and Quetext. These tools compare only exact matches and ignore language-specific constraints, whereas Maulik can detect plagiarism when the root of a word is used or a word is replaced by a synonym. Application: Maulik is a software tool that discourages plagiarism and encourages people to develop their writing skills.
Keywords
Cosine Similarity, Plagiarism, Stemming, Stop Word, Synonyms
- Effect of Stop Word Removal on Document Similarity for Hindi Text
Authors
Urvashi Garg 1, Vishal Goyal 2
Affiliations
1 Haryana College of Technology and Management, Kaithal, IN
2 Punjabi University, Patiala, IN
Source
Research Cell: An International Journal of Engineering Sciences, Vol 13 (2014), Pagination: 161-163
Abstract
Stop word removal is an important NLP technique. Stop words are very common in any document. In this paper, we create a list of stop words for Hindi text based on the frequency of words in documents. Hindi documents from the EMILLE corpus, in UTF-8 encoding, are used to identify the stop words. The percentage of stop words in a document is determined and analyzed experimentally. The paper discusses the effect of stop word removal on the similarity of two documents containing Hindi text. The Hoad and Zobel approach is used to compute the similarity of documents containing Hindi text.
Keywords
Stop Words, Removal, Text, Hindi, List, Frequency.
- Plagiarism and Detection Tools: An Overview
Authors
Affiliations
1 HCTM, Kaithal, IN